Graph generative models from information theory

Author

  • Lin Han
Abstract

Generative models are commonly used in statistical pattern recognition to describe the probability distributions of patterns in a vector space. In recent years, supported by the wide range of mathematical tools available for vector spaces, many algorithms for constructing generative models have been developed. By comparison, generative models for graphs have seen far less progress. In this thesis, we address the problem of constructing a generative model for graphs using information theory. Given a set of sample graphs, the generative model should not only capture the structural variation of the sample graphs but also allow new graphs that share similar properties with the original graphs to be generated. We pose the problem of constructing a generative model for graphs as that of constructing a supergraph structure for the graphs.

In Chapter 3, we describe a method of constructing a supergraph-based generative model given a set of sample graphs. By adopting the a posteriori probability developed for a graph matching problem, we obtain a probabilistic framework which measures the likelihood of the sample graphs, given the structure of the supergraph and the correspondence information between the nodes of the sample graphs and those of the supergraph. The supergraph we aim to obtain is the one that maximizes the likelihood of the sample graphs. The supergraph is represented by its adjacency matrix, and we develop a variant of the EM algorithm to locate the adjacency matrix that maximizes this likelihood. Experimental evaluations demonstrate that the constructed supergraph performs well on classifying graphs.

In Chapter 4, we develop graph characterizations that can be used to measure the complexity of graphs. The first is the von Neumann entropy of a graph associated with its normalized Laplacian matrix. Because this characterization is defined by the eigenvalues of the normalized Laplacian matrix, it is a graph invariant. By applying some transformations, we also develop a simplified form of the von Neumann entropy, which can be expressed in terms of the node degree statistics of the graph. Experimental results reveal the effectiveness of the two graph characterizations.

Our third contribution is presented in Chapter 5, where we use the graph characterization developed in Chapter 4 to measure the supergraph complexity, and we develop a novel framework for learning a supergraph using the minimum description length criterion. We combine the Jensen-Shannon kernel with our supergraph construction, which provides a way of measuring graph similarity. We also develop a method of sampling new graphs from the supergraph. The supergraph presented in this chapter is a generative model which can fulfil the tasks of graph classification, graph clustering, and generating new graphs. We experiment with both the COIL and “Toy” datasets to illustrate the utility of our generative model.

Finally, in Chapter 6, we propose a method of selecting prototype graphs of the most appropriate size from candidate prototypes. The method works by partitioning the sample graphs into two parts and approximating their hypothesis space using the partition functions. From the partition functions, the mutual information between the two sets is defined. The prototype which gives the highest mutual information is selected.
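As an illustration of the degree-based form of the von Neumann entropy described for Chapter 4, the following is a minimal sketch and not necessarily the thesis's exact formulation. It assumes the density matrix is the normalized Laplacian scaled by 1/n (so its eigenvalues are λ_i/n), and it replaces each term -x ln x with the quadratic approximation x(1 - x). Under those assumptions the entropy reduces to Tr(L)/n - Tr(L²)/n², which for a simple undirected graph without isolated nodes equals 1 - 1/n - (2/n²) Σ 1/(d_u d_v), summed over undirected edges (u, v). The function name and argument layout are hypothetical.

```python
from collections import Counter


def approx_von_neumann_entropy(num_nodes, edges):
    """Quadratic approximation of the von Neumann entropy of a graph.

    Assumptions (not stated in the abstract): the graph is simple and
    undirected, every undirected edge appears exactly once in `edges`,
    there are no self-loops, and no node is isolated.
    """
    # Node degrees from the edge list.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Tr(L) = n only holds when every node has at least one edge.
    if len(degree) != num_nodes:
        raise ValueError("every node must have at least one edge")

    # Sum of 1 / (d_u * d_v) over undirected edges.
    edge_term = sum(1.0 / (degree[u] * degree[v]) for u, v in edges)

    n = float(num_nodes)
    # Factor 2: each undirected edge contributes twice to Tr(L^2).
    return 1.0 - 1.0 / n - 2.0 * edge_term / (n * n)
```

For a triangle, `approx_von_neumann_entropy(3, [(0, 1), (1, 2), (0, 2)])` gives approximately 0.5, and for a three-node path, `approx_von_neumann_entropy(3, [(0, 1), (1, 2)])` gives approximately 4/9. Only node degrees are needed, so the cost is linear in the number of edges, whereas the eigenvalue-based definition requires a spectral decomposition.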

Similar resources

Embedding Geometry in Generative Models for Pose Estimation of Object Categories

Regression-based models built on local gradient-based feature descriptors have been shown to be successful for continuous pose estimation of object categories. Nonetheless, a crucial weakness of these methods is that no geometric information is taken into account. Therefore, geometrically inconsistent poses may be preferred, and this forces the use of a coarse-grained pose estimator as a pre-process...


ON THE MATCHING NUMBER OF AN UNCERTAIN GRAPH

Uncertain graphs are employed to describe graph models with indeterministic information that is produced by human beings. This paper aims to study the maximum matching problem in uncertain graphs. The number of edges of a maximum matching in a graph is called the matching number of the graph. Due to the existence of uncertain edges, the matching number of an uncertain graph is essentially an uncertain var...


Why Steiner-tree type algorithms work for community detection

We consider the problem of reconstructing a specific connected community S ⊂ V in a graph G = (V,E), where each node v is associated with a signal whose strength grows with the likelihood that v belongs to S. This problem appears in social or protein interaction networks, where in the latter case it is also referred to as the signaling pathway reconstruction problem (Bailly-Bechet et al., 2011). We study this commu...


MMDS 2008: Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis, Part II

Algorithmic Approaches to Networked Data. In an algorithmic perspective on improved models for data, Milena Mihail of the Georgia Institute of Technology began by describing the recent development of a rich theory of power-law random graphs, i.e., graphs that are random conditioned on a specified input power-law degree distribution. With the increasingly wide range of large-scale social and info...


Improvement of generative adversarial networks for automatic text-to-image generation

This research concerns the use of deep learning tools and image processing technology for the automatic generation of images from text. Previous studies have used a single sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions, each given as a sentence, to produce and improve the image. The proposed ...


Publication date: 2012